This is a Jupyter Notebook. The code you see is written in Python 3.
The audio samples in this Notebook should be played via headphones or earphones, so that the binaural sound localization cues can be perceived.
When you open the Notebook in a suitable environment that renders it correctly, the <Audio> output elements should appear in your browser right away, as they are saved together with the Notebook.
If you have no Python/Jupyter installed on your computer, a "static" export of this notebook is also available (but may be removed without further notice in the future):
https://elpy.de/class/Sys_Audio_Class.html
If you want to play around with the code and change parameters, there are two alternatives:

1. Run the Notebook in a hosted environment (e.g. with your Google account). Requires: Google account. After executing the code cell (the one with the `import` statements near the beginning) once, you are free to play around.
2. Run the Notebook locally. Requires: Python 3 installed on your computer, plus at least numpy, jupyter and all of their dependencies. All of this comes bundled in the Anaconda Distribution. Make sure you have Python 3.6 or 3.7. You can then download the Notebook file Sys_Audio_Class.ipynb from Github, place it in some directory where you find it in Jupyter and run it as usual.
In between the Python code cells (each having an input area with code and, possibly, an output area), there may be text areas (like this one). These are Markdown cells. If you accidentally start editing one of these, it looks less like nicely formatted text and more like a colored text editor. You can "execute" those cells in order to recover the formatted version.
There are two functions noise() and tone() which are used to generate the examples below.
Both functions display a small Audio element that you can use to play the generated sounds.
To use one of the functions, just write its name followed immediately by parentheses, like this:
noise()
or
tone()
You can change parameters to non-default values by indicating this inside of the parentheses. For example, a tone with 1000 Hz instead of the default 440 Hz is generated like this:
tone(freq = 1000)
Do not include the units of the parameters. Only numbers are allowed. Decimals like 123.45 are usually allowed as well.
All parameter names are case sensitive (use all-lower-case). The noise() function accepts the following parameters:
| parameter | default value | description |
|---|---|---|
| min_freq | 20 | Lower end of the frequency range (pass-band) of the noise in Hz, must be positive. |
| max_freq | 20000 | Upper end of the frequency range (pass-band) of the noise in Hz, must be greater than min_freq and should not be above 24000. |
| itd | 0 | Interaural time difference in microseconds (µs). |
| ild | 0 | Interaural level difference in decibel (dB). |
| bc | +100 | Binaural correlation in percent, between -100 and +100. |
| length | 1.0 | Length of the signal in seconds, keep it reasonable. |
| level | 0 | Relative level in dB, can be negative or positive to decrease or increase the level. If you see a warning indicating that clipping occurred, the level is too high. The maximum value also depends on the actual ILD (and vice versa). |
For each parameter that you don't indicate explicitly, the default value will be used. In other words, using the noise() function without parameters is equivalent to (explicitly) setting all parameters to their default values:
noise(min_freq = 20, max_freq = 20000, itd = 0, ild = 0, bc = 100, length = 1.0, level = 0)
The tone() function accepts the following parameters:

| parameter | default value | description |
|---|---|---|
| freq | 440 | Frequency of the tone in Hz. |
| itd | 0 | Interaural time difference in microseconds (µs). |
| ild | 0 | Interaural level difference in decibel (dB). |
| bc | +100 | Binaural correlation in percent, between -100 and +100. |
| length | 1.0 | Length of the signal in seconds, keep it reasonable. |
| level | 0 | Relative level in dB, can be negative or positive to decrease or increase the level. If you see a warning indicating that clipping occurred, the level is too high. The maximum value also depends on the actual ILD (and vice versa). |
tone(freq = 440, itd = 0, ild = 0, bc = 100, length = 1.0, level = 0)
The other parameters of noise() and tone(), such as ramp_dur, pad_length, show_player and return_signal, do not need to be changed for the purposes of this class. But you are free to do so... maybe to learn more about Python.
#####################################################
## This code cell must be executed once before the ##
## others. It defines all the necessary functions. ##
#####################################################
import numpy as np
from IPython.display import Audio, HTML, display
import struct
from io import BytesIO
import wave
# Default sampling rate:
samplingrate = 48000
def db2amp(d):
    return 10 ** (float(d) / 20)
def amp2db(f):
    return 20 * np.log10(f)
def us2s(microseconds):
    return microseconds * 1e-6
def s2us(seconds):
    return seconds * 1e6
def ms2s(m_seconds):
    return m_seconds * 1e-3
def s2ms(seconds):
    return seconds * 1e3
def times(signal):
    if isinstance(signal, (tuple, list)):
        signal = signal[0]
    return np.arange(signal.size) / samplingrate
def signal_energy(signal, fs=100e3):
    return np.sum(signal.astype(np.float64) ** 2) / fs
def signal_power(signal, fs=100e3):
    return signal_energy(signal, fs=fs) / signal.size
def band_limits(center=8500, octave_width=1.0):
    return float(center) * 2**(-.5*octave_width), float(center) * 2**(+.5*octave_width)
def to_stereo(signal):
    """Return a tuple of left and right signal
    A single signal will be duplicated.
    A stereo signal will be returned unchanged.
    """
    try:
        signal_L, signal_R = signal
    except ValueError:
        signal_L = signal
        signal_R = signal
    return (signal_L, signal_R)
def ramp(x = None, start = 0.0, stop = 1.0, ramp_dur = .05, fs = None, amp = 1.0):
    """ramp(x = np.ones(fs), start = .400, stop = .500, ramp_dur = .05, fs = fs, amp = 1.0)
    Ramp the signal x by linear ramps between start and stop
    Examples
    ========
    >>> ramp()
    returns the default ramp function, generating x internally as all ones for 1 s at fs.
    """
    if fs is None:
        fs = samplingrate
    if x is None:
        x = np.ones(int(fs))
    t1 = start
    t2 = start + ramp_dur
    t3 = stop - ramp_dur
    t4 = stop
    t = np.arange(x.size) / fs
    r = amp * np.piecewise(t, [(t1 <= t) & (t < t2), (t2 <= t) & (t < t3), (t3 <= t) & (t < t4)],
                           [lambda t: (t - t1) / (t2 - t1), 1, lambda t: 1 - (t - t3) / (t4 - t3), 0])
    return r * x
def bandpass_itd_noise(min_freq = 20, max_freq = 20000, itd=0, fs=None, samples = None):
    """bandpass_itd_noise()
    Generate a stereo noise band-limited to [min_freq, max_freq] Hz with the
    given interaural time difference (in µs), applied as opposite
    frequency-domain phase shifts of itd/2 per channel.
    """
    if fs is None:
        fs = samplingrate
    if samples is None:
        samples = int(fs)
    # Calculate available fft frequencies:
    freqs = np.fft.rfftfreq(samples, 1/fs)
    f = np.zeros_like(freqs)
    idx = np.where(np.logical_and(freqs >= min_freq, freqs <= max_freq))[0]
    # Scale so that the signal power is independent of the bandwidth:
    f[idx] = np.sqrt(.5 * samples) * (f.size / idx.size)**.5
    f = np.array(f, dtype='complex')
    phases_rad = np.random.rand(len(f)) * 2 * np.pi
    phases = np.cos(phases_rad) + 1j * np.sin(phases_rad)
    # Half the ITD-induced phase shift per ear, with opposite signs:
    itd_phas_rad = us2s(itd) * np.pi * freqs
    shift_left = np.cos(-itd_phas_rad) + 1j * np.sin(-itd_phas_rad)
    shift_right = np.cos(+itd_phas_rad) + 1j * np.sin(+itd_phas_rad)
    phases_left = phases * shift_left
    phases_right = phases * shift_right
    f_left = f * phases_left
    f_right = f * phases_right
    s_left = np.fft.irfft(f_left).real
    s_right = np.fft.irfft(f_right).real
    return s_left, s_right
def do_ild(signal, ild = 0):
    signal_L, signal_R = to_stereo(signal)
    # Split the level difference evenly: attenuate one ear, amplify the other.
    amp_L = db2amp(-.5 * ild)
    amp_R = db2amp(+.5 * ild)
    return amp_L * signal_L, amp_R * signal_R
def do_bc(signal, bc = 100, **noise_kwargs):
    signal_L, signal_R = to_stereo(signal)
    if bc != 100:
        if bc < 0:
            signal_R = -signal_R
        r = abs(bc / 100)
        _, decorr_noise_L = bandpass_itd_noise(itd=0, **noise_kwargs)
        _, decorr_noise_R = bandpass_itd_noise(itd=0, **noise_kwargs)
        # Adjust powers:
        decorr_noise_L = decorr_noise_L / (signal_power(decorr_noise_L) / signal_power(signal_L))**.5
        decorr_noise_R = decorr_noise_R / (signal_power(decorr_noise_R) / signal_power(signal_R))**.5
        signal_L = r**.5 * signal_L + (1 - r)**.5 * decorr_noise_L
        signal_R = r**.5 * signal_R + (1 - r)**.5 * decorr_noise_R
    return signal_L, signal_R
def stack(*signals):
    n = len(signals)
    signal_stack = tuple(np.sum(np.stack([s[c] for s in signals]), axis=0) / np.sqrt(n) for c in (0, 1))
    return signal_stack
def pscale(signal, rel_power=1.0):
    scaled_signal = tuple(signal[c] * np.sqrt(rel_power) for c in (0, 1))
    return scaled_signal
def player(signal, rel_abi=0):
    # Make it stereo and interleave into one array
    signal = np.vstack(to_stereo(signal))
    scaled = signal.T.ravel() * 1024 * db2amp(rel_abi)
    if np.max(np.abs(scaled)) > 32767:
        scale_by = 32767 / np.max(np.abs(scaled))
        scaled = scaled * scale_by
        import sys
        print(f"Signal would have clipped. Scaled (down) by {amp2db(scale_by):.1g} dB.", file=sys.stderr)
    fp = BytesIO()
    waveobj = wave.open(fp, mode='wb')
    waveobj.setnchannels(2)
    waveobj.setframerate(samplingrate)
    waveobj.setsampwidth(2)
    waveobj.setcomptype('NONE', 'NONE')
    waveobj.writeframes(b''.join([struct.pack('<h', x) for x in np.int16(scaled)]))
    pmc_data = fp.getvalue()
    waveobj.close()
    display(Audio(pmc_data, rate=samplingrate, autoplay=False))
def save_wave(signal, filename, rel_abi=0):
    # Make it stereo and interleave into one array
    signal = np.vstack(to_stereo(signal))
    scaled = signal.T.ravel() * 1024 * db2amp(rel_abi)
    if np.max(np.abs(scaled)) > 32767:
        scale_by = 32767 / np.max(np.abs(scaled))
        scaled = scaled * scale_by
        import sys
        print(f"Signal would have clipped. Scaled (down) by {amp2db(scale_by):.1g} dB.", file=sys.stderr)
    with open(filename, 'wb') as fp:
        waveobj = wave.open(fp, mode='wb')
        waveobj.setnchannels(2)
        waveobj.setframerate(samplingrate)
        waveobj.setsampwidth(2)
        waveobj.setcomptype('NONE', 'NONE')
        waveobj.writeframes(b''.join([struct.pack('<h', x) for x in np.int16(scaled)]))
        waveobj.close()
def noise(min_freq = 20, max_freq = 20000, itd=0, ild=0, bc=100, length=1.0, level=0, ramp_dur=0.05, pad_length=0.005, show_player=True, return_signal=False):
    samples = int((length + 2 * pad_length) * samplingrate)
    signal_LR = bandpass_itd_noise(min_freq, max_freq, itd, samplingrate, samples)
    signal_LR = do_bc(signal_LR, bc=bc, min_freq=min_freq, max_freq=max_freq, fs=samplingrate, samples=samples)
    signal_LR = do_ild(signal_LR, ild)
    signal_LR = [ramp(s, start=pad_length, stop=length+pad_length, ramp_dur=ramp_dur) for s in signal_LR]
    if show_player:
        display(HTML(f'<p><strong>Noise</strong> {min_freq}-{max_freq} Hz, '
                     f'ITD = {itd} μs, ILD = {ild} dB, BC = {bc}%'
                     '</p><style type="text/css">audio {min-width: 400px; }</style>'))
        player(signal_LR, rel_abi=level)
    if return_signal:
        return signal_LR
def tone(freq = 440, itd=0, ild=0, bc=100, length=1.0, level=0, ramp_dur=0.05, pad_length=0.005, show_player=True, return_signal=False, **bc_noise_kwargs):
    samples = int((length + 2 * pad_length) * samplingrate)
    t = np.arange(0, samples) / samplingrate
    phase = np.random.rand(1) * 2 * np.pi
    # Half the ITD-induced phase shift per ear, with opposite signs:
    shift = us2s(itd) * np.pi * freq
    signal_LR = (
        np.sin(t * 2 * np.pi * freq + phase - shift),
        np.sin(t * 2 * np.pi * freq + phase + shift)
    )
    signal_LR = do_bc(signal_LR, bc=bc, fs=samplingrate, samples=samples, **bc_noise_kwargs)
    signal_LR = do_ild(signal_LR, ild)
    signal_LR = [ramp(s, start=pad_length, stop=length+pad_length, ramp_dur=ramp_dur) for s in signal_LR]
    if show_player:
        display(HTML(f'<p><strong>Tone</strong> {freq} Hz, '
                     f'ITD = {itd} μs, ILD = {ild} dB, BC = {bc}%'
                     '</p><style type="text/css">audio {min-width: 400px; }</style>'))
        player(signal_LR, rel_abi=level)
    if return_signal:
        return signal_LR
def tone_stack(freqs = np.arange(500, 12000, 1000), show_player=True, return_signal=False, show_single_player=False, **tone_kwargs):
    tone_kwargs.update(show_player=show_single_player)
    tone_kwargs.update(return_signal=True)
    signal_LR = stack(*[tone(freq=f, **tone_kwargs) for f in freqs])
    if show_player:
        display(HTML(f'<p><strong>Tone-Stack</strong> {freqs} Hz, '
                     f'ITD = {tone_kwargs.get("itd",0)} μs, ILD = {tone_kwargs.get("ild",0)} dB, BC = {tone_kwargs.get("bc",100)}%'
                     '</p><style type="text/css">audio {min-width: 400px; }</style>'))
        player(signal_LR)
    if return_signal:
        return signal_LR
The following two cells each include several 0.5-second (500 ms) broadband noise samples.
In the first cell, the interaural time difference (ITD) was varied.
By convention, negative ITDs indicate that the sound presented to the left ear is leading, positive ITDs indicate that the sound presented to the right ear is leading.
Sounds that include "only" ITD information are usually perceived "inside the head", i.e. they are internalized. However, sounds with different ITDs should be clearly shifted inside the head according to the ITD value.
Listen to the different sound samples - it's easiest to start with the one that has ITD = 0 µs. Check your own ITD range and correlate perceptual effects with ITD values.
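As an aside, the way an ITD is imposed here can be checked outside the Notebook functions. The following is a minimal standalone sketch (plain numpy, independent of the code cell above) of the same idea used in bandpass_itd_noise(): each ear gets a frequency-domain phase shift of half the ITD, with opposite signs, and the interaural cross-correlation then peaks at the requested delay.

```python
import numpy as np

fs = 48000
itd_us = 600                               # requested ITD in microseconds
n = fs // 10                               # 0.1 s of noise is enough here
rng = np.random.default_rng(0)

# White noise, delayed/advanced by itd/2 per ear via a frequency-domain
# phase shift (the same idea as in bandpass_itd_noise() above):
freqs = np.fft.rfftfreq(n, 1 / fs)
spec = np.fft.rfft(rng.standard_normal(n))
half_phase = itd_us * 1e-6 * np.pi * freqs
left = np.fft.irfft(spec * np.exp(-1j * half_phase), n)
right = np.fft.irfft(spec * np.exp(+1j * half_phase), n)

# The interaural cross-correlation peaks (approximately) at the imposed delay:
lag = np.argmax(np.correlate(left, right, mode='full')) - (n - 1)
print(f"recovered ITD: {lag / fs * 1e6:.0f} µs")
```

The recovered value is quantized to whole samples (about 21 µs at 48 kHz), so it only approximates the requested 600 µs.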
for itd in [-4800, -2400, -1200, -900, -750, -600, -450, -300, -150, 0, 150, 300, 450, 600, 750, 900, 1200, 2400, 4800]:
    noise(itd=itd, length=.5)
In the second cell, the interaural level difference (ILD) was varied.
By convention, negative ILDs indicate that the sound presented to the left ear is louder, positive ILDs indicate that the sound presented to the right ear is louder.
Like ITD-only sounds, ILD-only sounds are internalized. The most extreme ILD values may actually sound like noise is presented to one ear alone.
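How an ILD is realized numerically can be seen in a short standalone sketch. It mirrors do_ild() from the code cell above: the level difference is split evenly, attenuating one ear and amplifying the other by ILD/2 each.

```python
import numpy as np

ild = 10                                   # desired ILD in dB (positive = right louder)
amp_L = 10 ** (-0.5 * ild / 20)            # same as db2amp(-ild/2) above
amp_R = 10 ** (+0.5 * ild / 20)            # same as db2amp(+ild/2) above

signal = np.sin(np.linspace(0, 2 * np.pi, 1000))
left, right = amp_L * signal, amp_R * signal

# The RMS levels of the two channels differ by exactly the requested ILD:
diff_db = 20 * np.log10(np.sqrt(np.mean(right**2)) / np.sqrt(np.mean(left**2)))
print(f"measured level difference: {diff_db:.2f} dB")
```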
for ild in [-40, -30, -20, -15, -10, -5, 0, 5, 10, 15, 20, 30, 40]:
    noise(ild=ild, length=.5)
The unambiguous detection of ITD depends on the frequencies available in a sound stimulus.
First, there is an upper limit of frequencies above which auditory nerve fibers cannot pass time (or phase) information. This limit is different between species.
Listen to the following pure tones, all at an ITD of +600 μs (right), with increasing frequencies.
Up to which frequency do you perceive the tone at the right side? At which frequencies do you perceive it more in the center or not localized at all?
for freq in [150, 300, 450, 600, 750, 900, 1050, 1200, 1350, 1500, 1800, 2100, 2400, 2700, 3000, 4000, 6000, 8000]:
    tone(freq=freq, itd=600, length=.5)
We can further explore our (lacking) ability to detect ITDs at high frequencies.
Let's first use some tones and narrow band noises in the low frequency range, i.e. at or around 500 Hz.
You should be able to localize these toward the left/right depending on the individual ITD.
cf = 500 # frequency or center frequency in Hz
for itd in [-500, -250, 0, 250, 500]:
tone(freq=cf, itd=itd, length=.5)
bw = 400 # bandwidth in Hz
for itd in [-500, -250, 0, 250, 500]:
noise(min_freq=cf-bw/2, max_freq=cf+bw/2, itd=itd, length=.5)
Using a higher (center) frequency removes our ability to detect ITDs. Here we use 7000 Hz (7 kHz).
cf = 7000 # frequency or center frequency in Hz
for itd in [-500, -250, 0, 250, 500]:
    tone(freq=cf, itd=itd, length=.5)
bw = 400  # bandwidth in Hz
for itd in [-500, -250, 0, 250, 500]:
    noise(min_freq=cf-bw/2, max_freq=cf+bw/2, itd=itd, length=.5)
An interesting effect arises when we use a combination of frequencies and ITDs that leads to "false" impressions.
To experience this effect, we use a (center) frequency of 714 Hz - we've seen above that ITDs can be detected in this frequency range. You can try this here again. Start at ITD = 0, then progressively go to higher ITDs.
What do you experience when you reach ITDs close to the period $T={1 \over f}$ of this frequency? Compare this to the negative ITDs.
Explanation (read after trying yourself): The periodic nature of tones leads to a "false" detection of ITD when the absolute value of the ITD approaches the period corresponding to a tone's frequency. When two tones (sine waves) of the same frequency are cross-correlated to determine the time lag between them (which is equal to the ITD), multiple ambiguous solutions are possible, separated from one another by the period of the used frequency. The actual percept that arises is biased toward the front, i.e. the smallest absolute ITD value is assumed. For a frequency of 714 Hz, this means that ITD = 0 μs and ITD = +1400 μs are effectively the same, as are -200 and +1200 μs, -400 and +1000 μs, -700 and +700 μs, and so on...
f = 714
print(f"The period of f = {f} Hz is approx. T = {1/f*1e6:.0f} μs")
for itd in [-400, -200, 0, 200, 400, 600, 800, 1000, 1200, 1400, 1600]:
    tone(freq=f, itd=itd)
However, the ambiguities observed with pure tones and narrow-band noises can be cancelled out by increasing the bandwidth, i.e. by "adding" more frequencies to the signal.
The next samples all use ITD = 1200 μs and a center frequency of 714 Hz. Start at the smallest bandwidth (here 1 Hz is effectively only a tone due to the method used to generate the stimuli) and experience how the perceived location of the stimulus changes with increasing bandwidth.
cf = 714
for bw in [1, 50, 150, 250, 350, 500, 800, 1000]:
    print(f"Bandwidth: {bw} Hz")
    noise(min_freq=cf-bw/2, max_freq=cf+bw/2, itd=1200, ramp_dur=.05)
We have seen before that it is not possible to detect ITDs in high-frequency tones or even in narrow-band noises containing a range of high frequencies.
If we use a wide enough range of frequencies, however, the so-called envelope (random changes in sound intensity), which always fluctuates at a lower frequency, may be used to detect ITDs.
Let's test this with just two ITDs: -400 μs (left) and +400 μs (right). Verify that you cannot hear any difference between the two ITDs of tones at 4 kHz and 8 kHz. Then see if the same is true for a noise containing all frequencies in between.
tone(freq=4000, itd=-400)
tone(freq=4000, itd=400)
tone(freq=8000, itd=-400)
tone(freq=8000, itd=400)
noise(min_freq=4000, max_freq=8000, itd=-400)
noise(min_freq=4000, max_freq=8000, itd=400)
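Why does the wideband noise carry usable ITD information where the tones fail? A standalone sketch (plain numpy, using an FFT-based Hilbert transform to compute the analytic signal) shows that a 4-8 kHz noise band has a randomly fluctuating envelope whose spectral content lies well below the carrier frequencies:

```python
import numpy as np

fs = 48000
n = fs // 10
rng = np.random.default_rng(1)

# Band-limit white noise to 4-8 kHz (like noise(min_freq=4000, max_freq=8000)):
freqs = np.fft.rfftfreq(n, 1 / fs)
spec = np.fft.rfft(rng.standard_normal(n))
spec[(freqs < 4000) | (freqs > 8000)] = 0
band = np.fft.irfft(spec, n)

# Envelope via the analytic signal (FFT-based Hilbert transform):
h = np.zeros(n)
h[0] = h[n // 2] = 1
h[1:n // 2] = 2
envelope = np.abs(np.fft.ifft(np.fft.fft(band) * h))

# The envelope's spectral content is limited by the 4 kHz bandwidth,
# far below the 4-8 kHz carrier frequencies:
env_spec = np.abs(np.fft.rfft(envelope - envelope.mean()))
print(f"dominant envelope frequency: {freqs[np.argmax(env_spec)]:.0f} Hz")
```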
As mentioned earlier, the mechanism for detecting ITDs is similar to cross-correlation. As we have seen, the success of this detection depends on the frequencies contained in the signal - at least in a perfect world.
In nature (and in the lab), the correlation between the two signals arriving at (or presented to) the two ears may be impaired. This could be due to other sounds arriving at the same time or due to noise that reaches the ears nearly independently. Neuronal noise may also play a role in deteriorating the signal quality before cross-correlation can be computed in the brain (e.g. via the Jeffress model in the avian brain).
We can mimic this effect by adding independent noise to the signals presented to the two ears. This independent noise carries no ITD information. The amount of noise added is expressed as the binaural correlation (BC), where 100% indicates a perfectly correlated signal and 0% indicates that only independent noise is presented.
Below, you can experience different levels of binaural correlation. Start with the first three examples (BC = 100%) and work your way down to lower BC values. What do you experience? Can you still localize the signal (distinguish the three different ITDs) at low BC values?
for bc in [100, 80, 60, 40, 20, 0]:
    print(f"Binaural Correlation {bc}%")
    noise(itd=-400, bc=bc)
    noise(itd=0, bc=bc)
    noise(itd=400, bc=bc)
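The mixing rule used by do_bc() above can be checked in isolation: the correlated component is weighted by √r and the independent noise by √(1-r), which (for equal-power components) yields a correlation coefficient of r between the ears. A minimal standalone sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 48000
bc = 60                                    # target binaural correlation in percent
r = bc / 100

common = rng.standard_normal(n)            # shared component (carries the ITD)
indep_L = rng.standard_normal(n)           # independent noise, no ITD information
indep_R = rng.standard_normal(n)

# The same mixing rule as in do_bc() above:
left = r**.5 * common + (1 - r)**.5 * indep_L
right = r**.5 * common + (1 - r)**.5 * indep_R

measured = np.corrcoef(left, right)[0, 1]
print(f"measured binaural correlation: {measured * 100:.0f}%")
```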
We have seen that ITD and ILD can both be used alone to shift the perceived location of a signal to the left or to the right - while the other value is kept at zero.
Naturally, ITD and ILD would both vary depending on the sound source location, and their effects would be combined.
In the current artificial situation (using headphones/earphones), where we can control ITD and ILD independently, it is possible to set them to contradictory values (i.e. combinations that wouldn't normally occur with real sound sources). This helps to determine how much influence each of these cues actually has on the perceived location of a sound and how they are combined.
Here is a first sound with ITD and ILD set to zero to give you, once again, a reference for how a central/frontal stimulus sounds:
noise(itd=0, ild=0)
All of the following samples have an ITD of +300 μs. At the same time, they have different ILDs, starting at 0 dB and progressively decreasing toward -15 dB.
Keep in mind that positive values of ITD and ILD are associated with stimuli on the right, negative values on the left.
Decrease the ILD step-by-step to move the stimulus back to the center/front. Which ILD "cancels out" the right shift given by the fixed ITD?
Note that even the most central/frontal combination is unlikely to sound exactly the same as the reference above.
for ild in [0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15]:
    noise(itd=300, ild=ild)